Abstract

The recently introduced approach for Encrypted Image Folding is generalized to make
it Self Contained. The goal is achieved by enlarging the folded image so as to embed all
the necessary information for the image recovery. The need for extra size is somewhat
compensated by considering a transformation with higher folding capacity. Numerical
examples show that the size of the resulting cipher image may be significantly smaller
than the plain text one. The implementation of the approach is further extended to deal
also with color images.



1 Introduction 

As cameras and digital scanners of very high resolution are becoming widely available, use 
of high resolution digital images is becoming part of everyday life. From a mathematical 
standpoint a digital image is a 2D data array, say $I \in \mathbb{R}^{N_x \times N_y}$. Each data point is referred to as
a pixel. For a gray level image, each pixel is represented with an intensity value $I$. For an RGB
representation of a color image, each pixel consists of a color triple $(I_R, I_G, I_B)$ representing the
intensity of the red, green and blue components, respectively.

The array of pixels used to represent a high resolution digital image is expected to be huge. 
Obviously storage and transmission of this raw data is impractical. Consequently, a reduction 
in data dimensionality is essential. The process that creates a compact data representation is 
called compression. Because of the nature of its informational content, compressing an image
usually involves special techniques. As opposed to binary files where a single bit error may 
destroy the whole piece of data, some distortion is usually tolerable even when compressing 
high quality images. This is because the visual perception of the image is more important than 
the exact pixel values. 

The most frequently applied image compression techniques involve transform coding, which
has three main steps: i) Application of an invertible transform to the intensity image, ii) Quantization
of the transformed data, iii) Bit-stream coding.

The familiar compression standard JPEG, for instance, implements step i) using the Discrete 
Cosine Transform (DCT), while the more recent, JPEG2000, uses Discrete Wavelet Transform 
(DWT). 

Another problem associated with the transmission of digital images is security. It comprises 
several aspects, including confidentiality and access control which are addressed by encryption. 
This implies that only parties holding decryption keys can access the content of an image.
Conventional image encryption is based on techniques developed for general data [1, 2]. In principle
generic encryption can be applied to a digital image before or after compression. However, 
encryption before compression would change the statistical properties of the image preventing 
compression from being applied successfully. 

On the other hand, as well as affecting the compression performance, direct encryption of
the compressed data results in a bit stream that is incompatible with the original image file
format. Less stringent schemes involve partial (or selective) encryption [1, 2]. However, the
security of these encryption systems is lower when compared to full encryption.

A different approach to image encryption is based on Chaotic Cryptography [3]-[10]. An
interesting critical analysis of the research in this area can be found in [11]. The connection
between chaotic and conventional cryptography is considered in [9].

Chaos based image encryption takes advantage of the extreme sensitivity to initial conditions 
of some dynamical systems, to control the 'confusion' of pixels in an intensity image. 

Thus, a chaotic method breaks the structure of the plain text image, producing a cipher 
image which is no longer compressible by conventional transform coding techniques. Hence, 
within the chaos based framework for image encryption the problem of storage and transmission 
of large images is currently unsolved. 

An alternative framework, involving only mathematical operations on an intensity image, 
but addressing simultaneously the problems of data reduction and encryption, has been recently 
introduced in [12]. The scheme is termed Encrypted Image Folding (EIF). The first step of this
new scheme differs from step i) in the above mentioned conventional compression scheme in
that, instead of using orthogonal transformations (e.g. DCT or DWT), the transformation is
realized by means of highly nonlinear approximation techniques. This increases the difficulty of
the approximation process but at the same time renders significant improvement in the sparsity 
of the image representation. 

Quantization and data reduction are achieved simultaneously by embedding some of the 
transformed data into a section of the image. Privacy is protected by granting access to the 
embedded data only to key holders. 

The underlying principle of the proposed framework is very simple: Suppose that an image
is given as an intensity array $I \in \mathbb{R}^{N_x \times N_y}$ and suppose also that, through a transformation
$B : \mathbb{R}^{N_x \times N_y} \to \mathbb{R}^{K}$, one can approximate equivalent information from an array $c \in \mathbb{R}^{K}$ obtained
as $c = BI$. If $K < N_x N_y$ by a considerable amount, $c$ is said to be a sparse representation of
the image $I$. It follows then that a suitable transformation to achieve sparsity should be rank
deficient, with an associated null space, null($B$), of large dimensionality. Such a transformation
creates room for storing covert information. Indeed, if one considers an element $F \in$ null($B$)
and adds it to the image, so as to create a new array $G = I + F$, one obtains the identical
representation $BG = BI = c$. The sparser the representation of an image, the larger the null
space of the associated transformation. Consequently, the first part of this effort focuses on
the design of an effective transformation for this purpose. The transformation is adaptively
constructed by the greedy selection strategy called Orthogonal Matching Pursuit (OMP).
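The null space argument above can be checked numerically. The following is a minimal sketch, assuming a random Gaussian matrix as a stand-in for the transformation $B$ and a flattened random vector as a stand-in for the image; the names `null_basis` and `hidden` are illustrative only and do not come from the EIF implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 16, 4                        # N pixels (flattened block), K << N coefficients
B = rng.standard_normal((K, N))     # stand-in for a rank deficient map from R^N to R^K

# Orthonormal basis of null(B): the last N-K right singular vectors of B.
_, _, Vt = np.linalg.svd(B)
null_basis = Vt[K:].T               # shape (N, N-K)

I = rng.standard_normal(N)          # stand-in for the flattened image
hidden = rng.standard_normal(N - K) # covert data, one number per null space direction
F = null_basis @ hidden             # F lies in null(B)
G = I + F                           # array carrying the covert data

# G and I share the same representation c = B I.
same_representation = np.allclose(B @ G, B @ I)
```

The dimension of the null space, $N - K$, is the number of values that can be stored, which is why sparser representations (smaller $K$) leave more room for embedding.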

The viability of EIF, as proposed in [12], stems from the possibility of processing a large
image by dividing it into small blocks. This allows the representation of some of the blocks to
be embedded into other blocks, realizing in that manner the folding of the image. However,
the technique in [12] is not self contained because, in addition to the folded image, extra
information is required at the unfolding step, and that information depends on the image. In
this Communication we propose to extend EIF so as to make it self contained. We term such
an extension Self Contained Encrypted Image Folding (SCEIF), because all that is needed to
successfully unfold the image is the private key. This goal is accomplished by enlarging the
folded image to create further space for the required information. The need for extra size is
compensated by considering a transformation with the capability of yielding sparser
representations than that in [12], therefore improving folding capacity. Access control to the folded
image is realized using a simple symmetric key encryption algorithm. The whole procedure is
characterized by its potential for real time implementation using parallel processing, but also
for its competitiveness using sequential processing.

The paper is organized as follows. In Sec. 2 we discuss the strategy for achieving a high level
of sparsity in image representation using the greedy selection strategy OMP, implemented here
in 2D with separable dictionaries. The framework for extending EIF to SCEIF is discussed in
Sec. 3 and illustrated in Sec. 4 by its application to i) an astronomical image created at the
European Southern Observatory and ii) a photograph of the natural world provided by National
Geographic. Remarks on the quality and security of the recovered images are given in Sec. 5.
Conclusions and final remarks are summarized in Sec. 6.
2 Sparse Image Representation



The approach to be introduced in the next section relies on the ability to design a specific
transformation which gives rise to a sparse representation of an image. This section is dedicated
to the construction of such a transformation.

Suppose that an image, given as an array $I \in \mathbb{R}^{N_x \times N_y}$ of intensity pixels, is to be
approximated by the linear decomposition

$$I^K = \sum_{k=1}^{K} c_k d_{\ell_k}, \tag{1}$$

where each $c_k$ is a scalar and each $d_{\ell_k}$ is an element of $\mathbb{R}^{N_x \times N_y}$ to be selected from a set,
$\mathcal{D} = \{d_n\}_{n=1}^{M}$, called a 'dictionary'.

A sparse approximation of $I \in \mathbb{R}^{N_x \times N_y}$ is an approximation of the form (1) such that the
number $K$ of elements in the decomposition is significantly smaller than $N = N_x N_y$. Clearly
one of the crucial issues to achieve high levels of sparsity is the selection of the right elements to
decompose the image. This goal has motivated the introduction of highly nonlinear techniques
for image approximation, which operate outside the traditional basis framework. Instead,
the terms in the decomposition are taken from a large redundant dictionary, from where the
elements $d_{\ell_k}$ in (1), called 'atoms', are chosen according to some optimality criterion.

Within the redundant dictionary framework for approximation, the problem of finding the
sparsest decomposition of a given image can be formulated as follows:

Approximate the image by the 'atomic decomposition' (1) such that the number $K$ of atoms
is minimum.

Equivalently, for a dictionary of $M > N$ elements the statement is reworded as:
Find the atomic decomposition:

$$I^K = \sum_{n=1}^{M} c_n d_n, \tag{2}$$

such that the counting measure $\|c\|_0 := \sum_{n=1}^{M} |c_n|^0$ is minimized.

Unfortunately the numerical minimization of $\|c\|_0$ restricted to (2) involves a
combinatorial problem for exhaustive search and is therefore intractable with classical means. Hence,
one is forced to abandon the sparsest solution and look for a 'satisfactory solution', i.e., a
solution such that the number of nonzero coefficients in (2) (equivalently, the number of $K$-terms
in (1)) is considerably smaller than the image dimension. One possibility for constructing a
solution of this nature could be to fix a value of $\alpha \in (0, 1]$ and minimize the diversity
measure, $\sum_{n=1}^{M} |c_n|^{\alpha}$ [15], closely related to the $\alpha$-entropy giving rise to the non-extensive statistical
mechanics [16, 17]. However, the numerical implementation of this possibility is too demanding
to apply in the present context. On the contrary, the goal of finding a sparse solution can be
achieved at speeds comparable to fast transforms by the greedy technique called OMP, which we
tailor to be applied in 2D. This approach selects the atoms in the decomposition (1) in a
stepwise manner, as will be described in the next section.
2.1 Orthogonal Matching Pursuit in 2D

OMP was introduced in [18]. We describe here our implementation in 2D, henceforth referred to
as OMP2D. Our version of the algorithm is specific to separable dictionaries, i.e., a 2D dictionary
which corresponds in effect to the tensor product of two 1D dictionaries. The implementation
is based on adaptive biorthogonalization and Gram-Schmidt orthogonalization procedures, as
proposed in [19]. However, the optimized selection proposed in [19] is not considered here, due
to the computational demands of such a selection process.

The images we are concerned with are assumed to be either gray level intensity images or 
color images stored in a standard RGB format. This format stores three color values, R(Red), 
G(Green) and B(Blue), for each pixel. Hence, the color image is given as three independent 2D 
arrays, each called a 'channel'. We represent the RGB channels as the arrays $I_z \in \mathbb{R}^{N_x \times N_y}$, $z = 1, 2, 3$
(a gray level intensity image can be considered a particular case of this representation
corresponding to a unique index $z = 1$).

Given an RGB image $I_z \in \mathbb{R}^{N_x \times N_y}$, $z = 1, 2, 3$ and two 1D dictionaries $\mathcal{D}^x = \{D^x_n \in \mathbb{R}^{N_x}\}_{n=1}^{M_x}$
and $\mathcal{D}^y = \{D^y_m \in \mathbb{R}^{N_y}\}_{m=1}^{M_y}$, our purpose is to approximate the arrays $I_z \in \mathbb{R}^{N_x \times N_y}$, $z = 1, 2, 3$
using common atoms for the three images. More precisely, for $i = 1, \ldots, N_x$ and $j = 1, \ldots, N_y$
we look for approximations of the form

$$I^K_z(i,j) = \sum_{n=1}^{K} c^z_n D^x_{\ell^x_n}(i) D^y_{\ell^y_n}(j), \quad z = 1, 2, 3. \tag{3}$$

Notice that, while the coefficients in the above decomposition depend on the image $I_z$, the
atoms participating in the decompositions are common to all the channels. For selecting those
atoms we adopt the OMP selection criterion extended to simultaneous decomposition of signals.
A discussion of this criterion can be found in [20]; in our context it is implemented as follows:
On setting $R^0_z = I_z$, $z = 1, 2, 3$, at iteration $k+1$ the algorithm selects the atoms $D^x_{\ell^x_{k+1}} \in \mathcal{D}^x$
and $D^y_{\ell^y_{k+1}} \in \mathcal{D}^y$ that maximize the sum over $z$ of the absolute value of the Frobenius inner products
$\langle D^x_n, R^k_z D^y_m \rangle_F$, $n = 1, \ldots, M_x$, $m = 1, \ldots, M_y$, i.e.,

$$\ell^x_{k+1}, \ell^y_{k+1} = \underset{\substack{n = 1, \ldots, M_x \\ m = 1, \ldots, M_y}}{\arg\max} \sum_{z=1}^{3} \Big| \sum_{i,j=1}^{N_x, N_y} D^x_n(i) R^k_z(i,j) D^y_m(j) \Big|,$$

with

$$R^k_z(i,j) = I_z(i,j) - \sum_{n=1}^{k} c^z_n D^x_{\ell^x_n}(i) D^y_{\ell^y_n}(j), \quad z = 1, 2, 3. \tag{4}$$

The three sets of coefficients $c^z_n$, $n = 1, \ldots, k$ involved in (4) are such that $\|R^k_z\|_F$ is minimum
for each $z$ ($\| \cdot \|_F$ being the Frobenius norm). This is guaranteed by calculating the coefficients
$c^z_n$, $z = 1, 2, 3$ as

$$c^z_n = \langle B^k_n, I_z \rangle_F, \quad n = 1, \ldots, k, \tag{5}$$
where matrices $B^k_n$, $n = 1, \ldots, k$, are recursively constructed at each iteration step as indicated
in Appendix A.

The algorithm iterates up to step, say $K$, for which, for a given $\rho$, the stopping criterion
$\sum_{z=1}^{3} \|I_z - I^K_z\|_F < \rho$ is met. The MATLAB function for the implementation of the OMP2D
approach on multiple 2D signals, which we have called OMP2DM1, is available from [21]. The
corresponding MEX file in C++, for faster implementation of the identical function, is also
available from [21].
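The selection and update steps described in this section can be sketched as follows. This is a simplified illustration in Python with NumPy, not the authors' MATLAB or MEX routine: the coefficients are recomputed at each step by ordinary least squares rather than by the recursive biorthogonalization of Appendix A, and the function name `omp2d_multi` and its arguments are hypothetical. The dictionaries are assumed to have unit-norm columns.

```python
import numpy as np

def omp2d_multi(channels, Dx, Dy, rho, max_atoms=None):
    """Greedy selection of common separable atoms for several 2D arrays.

    channels: array (Z, Nx, Ny); Dx: (Nx, Mx) and Dy: (Ny, My), unit-norm columns.
    Returns the selected index pairs and the coefficients, shape (Z, K).
    """
    Z, Nx, Ny = channels.shape
    limit = min(Nx * Ny, Dx.shape[1] * Dy.shape[1])
    max_atoms = limit if max_atoms is None else min(max_atoms, limit)
    targets = channels.reshape(Z, -1).T        # (Nx*Ny, Z), one column per channel
    residuals = channels.astype(float)
    pairs, atoms = [], []
    coeffs = np.zeros((0, Z))
    while len(pairs) < max_atoms and sum(np.linalg.norm(r) for r in residuals) >= rho:
        # Inner products <D^x_n, R_z D^y_m>_F for all n, m, z; the 2D
        # dictionary is never formed explicitly (separability).
        corr = np.einsum('in,zij,jm->znm', Dx, residuals, Dy)
        score = np.abs(corr).sum(axis=0)       # sum over channels z
        for n, m in pairs:                     # never reselect an atom
            score[n, m] = -1.0
        n, m = np.unravel_index(np.argmax(score), score.shape)
        pairs.append((int(n), int(m)))
        atoms.append(np.outer(Dx[:, n], Dy[:, m]).ravel())
        A = np.stack(atoms, axis=1)            # (Nx*Ny, k) selected 2D atoms
        # Least squares coefficients minimize the Frobenius norm of each residual.
        coeffs, *_ = np.linalg.lstsq(A, targets, rcond=None)
        residuals = (targets - A @ coeffs).T.reshape(Z, Nx, Ny)
    return pairs, coeffs.T
```

Recomputing the least squares solution from scratch is adequate for blocks as small as 8 x 8 pixels, but a recursive update is what makes the approach competitive in speed on full images.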



2.2 Constructing the dictionary 

The other crucial design for the success in finding a 'good enough' sparse representation of the
form (3) is the dictionary which provides the possible choices of atoms at the selection step.

The mixed dictionary used in [12] for this purpose consists of two components for each 1D
dictionary:

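• A Redundant Discrete Cosine dictionary (RDC) $\mathcal{D}^x_1$ as given by:

$$\mathcal{D}^x_1 = \Big\{ w_i \cos\Big( \frac{\pi (2j-1)(i-1)}{2 M_x} \Big),\ j = 1, \ldots, N_x \Big\}_{i=1}^{M_x},$$

with $w_i$, $i = 1, \ldots, M_x$ normalization factors. For $M_x = N_x$ this set is a Discrete Cosine
orthonormal basis for the Euclidean space $\mathbb{R}^{N_x}$. For $M_x = 2 l N_x$, with $l \in \mathbb{N}$, the set is an
RDC dictionary with redundancy $2l$, that will be fixed equal to 2.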

• The standard Euclidean basis, also called Dirac basis, i.e.

$$\mathcal{D}^x_2 = \{ e_i(j) = \delta_{i,j},\ j = 1, \ldots, N_x \}_{i=1}^{N_x}.$$


Now we include an additional component: 

• A family of cubic B-spline dictionaries of different support, as proposed in [22], but
discretizing the domain by taking the value of a prototype B-spline only at the knots
and translating that prototype one point at each translation step. Each B-spline based
dictionary is given as

$$\mathcal{D}^x_s = \{ w_i B^s_m(j - i)|_{N_x};\ j = 1, \ldots, N_x \}_{i=1}^{M_s},$$

where the notation $B^s_m(j - i)|_{N_x}$ indicates the restriction of the B-spline of order $m$, centered
at the point $i$, to be an array of size $N_x$. Cubic splines are obtained setting $m = 4$. The
factors $w_i$, $i = 1, \ldots, M_s$, are normalization constants, with $M_s$ the number of atoms in
the dictionary $s$. The values of $s$ to be considered are $s = 3$ and $s = 4$, which label the
dictionaries arising as translation of a prototype B-spline having, respectively, 3 and 7
points of nonzero value (see Fig. 1).
Figure 1: The discrete prototype cubic B-splines of supports 3 and 7, generating (by translation at
every point) the dictionaries $\mathcal{D}^x_3$ and $\mathcal{D}^x_4$, respectively.

The complete 1D dictionary $\mathcal{D}^x$ is constructed as $\mathcal{D}^x = \cup_{s=1}^{4} \mathcal{D}^x_s$. The dictionary $\mathcal{D}^y$ is built in
equivalent fashion, but changing $N_x$ to $N_y$ and $M_x$ to $M_y$ when applicable.

The required 2D dictionary is formed as $\mathcal{D} = \mathcal{D}^x \otimes \mathcal{D}^y$. However, it is not necessary to store
the 2D dictionary $\mathcal{D}$, since the algorithm takes advantage of the separability inherent in its
construction. This advantage significantly reduces storage demands and extends the possibility
of using the OMP approach in 2D.
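As an illustration of how the 1D components can be assembled, the sketch below builds the RDC and Dirac parts only; the B-spline components are omitted. It assumes the usual redundant cosine form $w_i \cos(\pi(2j-1)(i-1)/(2M))$ for the RDC atoms, and the function names are hypothetical.

```python
import numpy as np

def rdc_dictionary(N, M):
    """Columns w_i * cos(pi (2j-1)(i-1) / (2M)), j = 1..N, i = 1..M, unit norm."""
    j = np.arange(1, N + 1)[:, None]
    i = np.arange(1, M + 1)[None, :]
    D = np.cos(np.pi * (2 * j - 1) * (i - 1) / (2 * M))
    return D / np.linalg.norm(D, axis=0)      # w_i are the normalization factors

def mixed_1d_dictionary(N, redundancy=2):
    """RDC component (redundancy 2 by default) followed by the Dirac basis."""
    return np.hstack([rdc_dictionary(N, redundancy * N), np.eye(N)])

Dx = mixed_1d_dictionary(8)                   # 8 x 24 for blocks of 8 x 8 pixels
Dy = mixed_1d_dictionary(8)

# A 2D atom is the outer product of two 1D atoms, so only Dx and Dy are stored.
atom = np.outer(Dx[:, 3], Dy[:, 5])
stored_values = Dx.size + Dy.size             # separable storage
full_2d_values = (8 * 8) * (Dx.shape[1] * Dy.shape[1])
```

For this block size the separable form stores 384 numbers, against 36864 for the explicit 2D dictionary built from the same two components.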

It is time now to examine closely the term 'good enough' for a sparse decomposition. Within
the present context, by the term good enough we mean a decomposition that a) increases sparsity
well beyond the levels attained by such techniques as DCT or DWT, and b) requires comparable
computational time.

Remark 1. The suitability of the mixed dictionary for block processing is essential in fulfilling
requirements a) and b) above, i.e., for processing an image by dividing it into small blocks
and approximating the blocks independently. This feature renders the complexity of the highly
nonlinear, and otherwise costly, selection technique linear in terms of the number of blocks
employed in decomposing the image.

The capacity of the dictionary based approach to achieve a satisfactory sparse approximation
of an image will become clear when illustrating the SCEIF technique in Sec. 4. In addition, we
present some comparisons on the results on standard test images which are listed in the first
column of Table 1. All the images are 8 bit gray level intensity images of 512 x 512 pixels. For
the actual processing we divide each image into blocks of 8 x 8 pixels and process the blocks
independently. The approximated blocks are then assembled to give the approximated image.
Sparsity is measured by the Sparsity Ratio (SR) defined as

$$\mathrm{SR} = \frac{\text{total number of pixels}}{\text{total number of coefficients}}.$$

In all the cases the number of coefficients is determined as the one required to produce a high
quality approximation with no visual deterioration with respect to the original image, in this
case corresponding to a PSNR of 43 dB (cf. (16)). The sparsity results achieved by selecting
atoms with OMP2D, from the proposed mixed dictionary, are displayed in the second column
of Table 1. The third column shows results produced by the DCT implemented using the
same blocking scheme. For further comparison the results produced by the Cohen-Daubechies-Feauveau
9/7 DWT (applied on the whole image at once) are displayed in the last column of
Table 1. Notice that while for the fixed PSNR of 43 dB the DCT and DWT approaches yield
comparable SR, the corresponding SR obtained by the mixed dictionary, for all the images,
is significantly higher. What is of paramount importance to our current interest is that the
processing time is very competitive. The actual speed of the approximation depends, of course,
on the sparsity of each image. For the set of images in Table 1 the mean SR is 4.76 and the
mean processing time is 1.72 seconds per image (average of ten independent runs in a MATLAB
environment on a 14" laptop with a 2.8 GHz processor and 3 GB of RAM).

Image        Dictionary   DCT    DWT
Barbara      5.02         3.10   2.94
Boat         4.61         2.61   2.60
Bridge       3.24         1.79   1.86
Film Clip    5.86         3.29   3.34
Lena         6.51         3.81   4.04
Mandrill     2.85         1.64   1.64
Peppers      5.23         2.88   2.96

Table 1: Comparison of the Sparsity Ratio (for PSNR 43 dB) achieved by the mixed dictionary
(second column) and that yielded by DCT and DWT (3rd and 4th columns respectively). The first
column lists the names of the popular test images where the approaches are compared.
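The figures of Table 1 can be cross-checked with a few lines of arithmetic. The sketch below only re-uses the tabulated values; the variable names are illustrative.

```python
# Sparsity Ratios from Table 1: (mixed dictionary, DCT, DWT)
sr = {
    "Barbara":   (5.02, 3.10, 2.94),
    "Boat":      (4.61, 2.61, 2.60),
    "Bridge":    (3.24, 1.79, 1.86),
    "Film Clip": (5.86, 3.29, 3.34),
    "Lena":      (6.51, 3.81, 4.04),
    "Mandrill":  (2.85, 1.64, 1.64),
    "Peppers":   (5.23, 2.88, 2.96),
}

# Mean SR of the mixed dictionary over the seven test images.
mean_sr = sum(v[0] for v in sr.values()) / len(sr)

# Gain of the mixed dictionary over DCT and over DWT, per image.
gain_dct = {name: v[0] / v[1] for name, v in sr.items()}
gain_dwt = {name: v[0] / v[2] for name, v in sr.items()}
```

The mean reproduces the value 4.76 quoted in the text, and for every image the mixed dictionary SR exceeds both the DCT and the DWT value by a factor larger than 1.6.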

3 Self Contained Encrypted Image Folding 

The idea of using the null space of a transformation for storing information in encrypted form
was first outlined in [13] and further discussed in [14]. However, it has been only recently
materialized as the EIF application [12]. The denomination is meant to reflect a particular
feature: the space created by a sparse representation of an image is used to store part of the
image itself, thereby reducing the original image size.

As already stated, we process each image $I_z$, $z = 1, 2, 3$ by dividing it into, say $Q$, blocks
$I_{z,q}$, $q = 1, \ldots, Q$, which without loss of generality are assumed to be square of $N_q \times N_q$
intensity pixels. For a fixed $q$-value the three blocks of intensity arrays $I_{z,q}$, $z = 1, 2, 3$ (each
of which corresponds to a color channel) are simultaneously approximated using the dictionary
$\mathcal{D} = \mathcal{D}^x \otimes \mathcal{D}^y$, as given in Sec. 2.2, by the atomic decomposition

$$I^{K_q}_{z,q} = \sum_{n=1}^{K_q} c^{z,q}_n D^x_{\ell^{x,q}_n} \otimes D^y_{\ell^{y,q}_n}, \quad q = 1, \ldots, Q, \ z = 1, 2, 3, \tag{6}$$

where $D^x_{\ell^{x,q}_n}$ and $D^y_{\ell^{y,q}_n}$ are the atoms that have been selected through the approach of Sec. 2.1
and span a subspace $\mathbb{V}_{K_q} = \mathrm{span}\{D^x_{\ell^{x,q}_n} \otimes D^y_{\ell^{y,q}_n}\}_{n=1}^{K_q} \subset \mathbb{R}^{N_q \times N_q}$.

For (6) to be a sparse approximation of $I_{z,q}$ the number $K_q$ of terms should be considerably
smaller than $N_q^2$. In other words, the dimension $N_q^2 - K_q$ of the orthogonal complement of
$\mathbb{V}_{K_q}$ in $\mathbb{R}^{N_q \times N_q}$, which is indicated as $\mathbb{V}^{\perp}_{K_q}$, should be significant in relation to $N_q^2$. In line
with [12] the subspace $\mathbb{V}^{\perp}_{K_q}$ is used to embed a part of the image in another part of the image,
as described below. The approximated image $I^K_z = \cup_{q=1}^{Q} I^{K_q}_{z,q}$, $z = 1, 2, 3$ is the plain text and the
cipher is the folded image.



3.1 Folding Procedure 

A number of, say $3H$, blocks are kept as 'hosts' for embedding the coefficients of the remaining
$3(Q - H)$ equations (6). For this, first the coefficients $c^{z,q}_n$, $n = 1, \ldots, K_q$, $q = H+1, \ldots, Q$, $z =
1, 2, 3$ are relabeled to become the components of vectors $(h^{z,q}_1, \ldots, h^{z,q}_{L_q})$, $q = 1, \ldots, H$, $z =
1, 2, 3$, each of length $L_q = N_q^2 - K_q$. These vectors are embedded in the $3H$ host blocks,
according to the procedure given in [12], as follows:

• For each value of $q$ and $z$ build a block of pixels $F_{z,q} \in \mathbb{R}^{N_q \times N_q}$ as

$$F_{z,q} = \sum_{i=1}^{L_q} h^{z,q}_i U^{z,q}_i, \quad q = 1, \ldots, H, \ z = 1, 2, 3, \tag{7}$$

where $U^{z,q}_i \in \mathbb{R}^{N_q \times N_q}$, $i = 1, \ldots, L_q$ is an orthonormal basis for $\mathbb{V}^{\perp}_{K_q}$ obtained as follows:

a) Using matrices $Y^{z,q}_i \in \mathbb{R}^{N_q \times N_q}$, $i = 1, \ldots, L_q$ randomly generated, with a public
initialization seed, and the already constructed projector $P_{\mathbb{V}_{K_q}}$ (cf. (A.2)), for $q =
1, \ldots, H$ and $z = 1, 2, 3$ compute the matrices $O^{z,q}_i$ as

$$O^{z,q}_i = Y^{z,q}_i - P_{\mathbb{V}_{K_q}} Y^{z,q}_i \in \mathbb{V}^{\perp}_{K_q}, \quad i = 1, \ldots, L_q. \tag{8}$$

b) Transform these matrices using a random rotation $\Pi_{\mathrm{key}}$ initialized with a private
key to obtain

$$X^{z,q}_i = \Pi_{\mathrm{key}} O^{z,q}_i, \quad i = 1, \ldots, L_q. \tag{9}$$

c) For each $z$ and $q$ use an orthonormalization procedure, that we indicate by the
operator Orth(·), to orthonormalize matrices $X^{z,q}_i$, $i = 1, \ldots, L_q$, and have the
orthonormal basis

$$\{U^{z,q}_i\}_{i=1}^{L_q} = \mathrm{Orth}(X^{z,q}_i, \ i = 1, \ldots, L_q), \quad q = 1, \ldots, H, \ z = 1, 2, 3 \tag{10}$$

to be used in (7) for embedding the coefficients of the remaining blocks $I^{K_q}_{z,q}$, $q =
(H+1), \ldots, Q$, $z = 1, 2, 3$.

• Fold the image by the superpositions $G_{z,q} = I^{K_q}_{z,q} + F_{z,q}$, $q = 1, \ldots, H$, $z = 1, 2, 3$ and
subsequent composition $G_z = \cup_{q=1}^{H} G_{z,q}$, $z = 1, 2, 3$.
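The folding of a single host block, and the corresponding unfolding, can be sketched as below for vectorized blocks. This is an illustration under stated assumptions rather than the authors' routine: integer seeds play the role of the public seed and the private key, QR factorization stands in for Orth(·), and the keyed rotation is assumed to mix the set of matrices $O_i$ among themselves (so that every $X_i$ remains in the orthogonal complement). The function names are hypothetical.

```python
import numpy as np

def complement_basis(atoms, key, public_seed=0):
    """Orthonormal bases (as columns) of V = span(atoms) and of its complement.

    atoms: (N, K) matrix whose columns are the vectorized selected atoms,
    assumed linearly independent. key: integer acting as the private key.
    """
    N, K = atoms.shape
    L = N - K
    Q, _ = np.linalg.qr(atoms)                  # orthonormal basis of V
    Y = np.random.default_rng(public_seed).standard_normal((N, L))
    O = Y - Q @ (Q.T @ Y)                       # O_i = Y_i - P_V Y_i, cf. (8)
    R, _ = np.linalg.qr(np.random.default_rng(key).standard_normal((L, L)))
    X = O @ R                                   # keyed rotation, cf. (9)
    U, _ = np.linalg.qr(X)                      # Orth(.), cf. (10)
    return Q, U

def fold(host, h, U):
    """G = I^K + F with F = sum_i h_i U_i, cf. (7)."""
    return host + U @ h

def unfold(G, Q, U):
    """Recover the host block as P_V G and the embedded vector from the rest."""
    host = Q @ (Q.T @ G)
    return host, U.T @ (G - host)
```

With the correct key the embedded vector is recovered up to rounding error. With a different key the host block is still obtained, since the projector onto the span of the atoms does not depend on the key, but the recovered vector is a rotated version of the embedded one.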

3.1.1 Making the approach self contained 

Knowledge of the coefficients in (6) is not enough to reconstruct the blocks $I^{K_q}_{z,q}$, $q = 1, \ldots, Q$, $z =
1, 2, 3$. For each $q$-value it is also necessary to know the indices of the atoms in the
decomposition. This matter is not considered in [12]. A contribution of this effort is the generalization of
the previous approach to deal with the storage of indices as well. The present proposal consists
of creating some 'ad hoc' blocks to embed the required indices. Without loss of generality the
blocks are assumed to be square, containing $N_q \times N_q$ intensity pixels. Using any normalized to
unity atom, say $A_q \in \mathbb{R}^{N_q \times N_q}$, the ad hoc intensity arrays $\tilde{I}_q \in \mathbb{R}^{N_q \times N_q}$, $q = 1, \ldots, H$ are created
as

$$\tilde{I}_q = K_q A_q, \quad q = 1, \ldots, H, \tag{11}$$

and $\tilde{L}_q = N_q^2 - 1$ indices are embedded in the orthogonal complement (with respect to $\mathbb{R}^{N_q \times N_q}$)
of the subspace spanned by the single atom $A_q$. The embedding procedure is equivalent to that
for embedding the coefficients, i.e.,

• For $q = 1, \ldots, H$, using a public initialization seed, generate the random matrices $\tilde{Y}^q_i$, $i =
1, \ldots, \tilde{L}_q$ to calculate the matrices $\tilde{O}^q_i$ as

$$\tilde{O}^q_i = \tilde{Y}^q_i - A_q \langle A_q, \tilde{Y}^q_i \rangle_F, \quad i = 1, \ldots, \tilde{L}_q. \tag{12}$$

• Transform these matrices using a random rotation initialized with the private key to
obtain

$$\tilde{X}^q_i = \Pi_{\mathrm{key}} \tilde{O}^q_i, \quad i = 1, \ldots, \tilde{L}_q. \tag{13}$$

• For each $q$-value use the orthonormalization procedure Orth(·) to orthonormalize matrices
$\tilde{X}^q_i$, $i = 1, \ldots, \tilde{L}_q$, to have the orthonormal basis

$$\{\tilde{U}^q_i\}_{i=1}^{\tilde{L}_q} = \mathrm{Orth}(\tilde{X}^q_i, \ i = 1, \ldots, \tilde{L}_q), \quad q = 1, \ldots, H, \tag{14}$$

needed to embed the indices. For this, first map each ordered pair of indices $(n, m)$, $n =
1, \ldots, M_x$, $m = 1, \ldots, M_y$ (which label the 2D dictionary atoms) to the single label
$\tilde{n} = 1, \ldots, M_x M_y$. Now the steps for embedding the indices of the atoms in $I^{K_q}_{z,q}$, $q =
1, \ldots, Q$ (cf. (6)) parallel those for embedding the coefficients. Arrange the indices to be
components of vectors $(\tilde{h}^q_1, \ldots, \tilde{h}^q_{\tilde{L}_q})$, $q = 1, \ldots, H$. For each $q$-value, use the corresponding
vector to generate the block of pixels $\tilde{F}_q \in \mathbb{R}^{N_q \times N_q}$ as

$$\tilde{F}_q = \sum_{i=1}^{\tilde{L}_q} \tilde{h}^q_i \tilde{U}^q_i, \quad q = 1, \ldots, H. \tag{15}$$

• Now 'fold' the ad hoc blocks by the superpositions $\tilde{G}_q = \tilde{I}_q + \tilde{F}_q$, $q = 1, \ldots, H$ and
subsequently produce the composition $\tilde{G} = \cup_{q=1}^{H} \tilde{G}_q$ to be split into 3 channels $\tilde{G}_z$, $z =
1, 2, 3$.

The folding process finishes by joining the folded channels $G_z$, $z = 1, 2, 3$ and the ad hoc ones
$\tilde{G}_z$, $z = 1, 2, 3$ to create the single folded RGB image $I_{\mathrm{folded},z}$, $z = 1, 2, 3$ as

$$I_{\mathrm{folded},z} = G_z \cup \tilde{G}_z, \quad z = 1, 2, 3.$$

This image is now endowed with all the information that is needed to recover the approximation
of the original image.
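The mapping of an ordered pair of atom indices to a single label, and its inverse used at the recovery stage, can be written in a couple of lines. The row-major convention below is an assumption, since the text does not specify which ordering is used, and the function names are illustrative.

```python
def pair_to_label(n, m, My):
    """Map (n, m), n = 1..Mx, m = 1..My, to a single label in 1..Mx*My."""
    return (n - 1) * My + m

def label_to_pair(label, My):
    """Inverse mapping, recovering the ordered pair from the single label."""
    n, m = divmod(label - 1, My)
    return n + 1, m + 1
```

Any one-to-one convention works, provided the same one is used when folding and when recovering the image.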

Note: Parameters, such as the original image dimensions, which would normally be placed 
in the header are added as pixel values in the last row of the folded image. 

3.2 Recovering Procedure 

At this stage the approximation $I^K_z = \cup_{q=1}^{Q} I^{K_q}_{z,q}$, $z = 1, 2, 3$ of the RGB image $I_z$, $z = 1, 2, 3$ is
recovered from the folded RGB image $I_{\mathrm{folded},z}$, $z = 1, 2, 3$ by following the steps below.

• Separate $I_{\mathrm{folded},z}$ into $G_z$, $z = 1, 2, 3$ and $\tilde{G}$, and these into the blocks $G_{z,q}$, $q = 1, \ldots, H$, $z =
1, 2, 3$ and $\tilde{G}_q$, $q = 1, \ldots, H$.

• Obtain $K_q$, $q = 1, \ldots, H$ from the inner products $\langle A_q, \tilde{G}_q \rangle_F = K_q$, $q = 1, \ldots, H$ (the
remaining ones, $K_q$, $q = H+1, \ldots, Q$, can be hidden in some additional ad hoc blocks
or just given as plain text intensity pixels).

• Obtain $\tilde{F}_q$ as $\tilde{F}_q = \tilde{G}_q - K_q A_q$, $q = 1, \ldots, H$.

• Recover the indices $(\tilde{h}^q_1, \ldots, \tilde{h}^q_{\tilde{L}_q})$, $q = 1, \ldots, H$ as

$$\tilde{h}^q_i = \langle \tilde{U}^q_i, \tilde{F}_q \rangle_F, \quad i = 1, \ldots, \tilde{L}_q,$$

and map them back to the arrays of ordered pairs $\{(\ell^{x,q}_n, \ell^{y,q}_n)\}_{n=1}^{K_q}$, $q = 1, \ldots, Q$.

• Obtain $I^{K_q}_{z,q}$, $q = 1, \ldots, H$, $z = 1, 2, 3$ from $G_{z,q}$ as $I^{K_q}_{z,q} = P_{\mathbb{V}_{K_q}} G_{z,q}$, $q = 1, \ldots, H$, $z =
1, 2, 3$ and $F_{z,q}$ as $F_{z,q} = G_{z,q} - I^{K_q}_{z,q}$.

• Recover vectors $(h^{z,q}_1, \ldots, h^{z,q}_{L_q})$, $q = 1, \ldots, H$, $z = 1, 2, 3$ as

$$h^{z,q}_i = \langle U^{z,q}_i, F_{z,q} \rangle_F, \quad i = 1, \ldots, L_q,$$

and regroup them back to get the original arrays of coefficients $\{c^{z,q}_n\}_{n=1}^{K_q}$, $q = (H+1), \ldots, Q$,
$z = 1, 2, 3$.

• Use the recovered indices and the recovered coefficients to compute $I^{K_q}_{z,q}$, $q = (H+1), \ldots, Q$,
$z = 1, 2, 3$ as in (6) and reconstruct the approximated RGB image as

$$I^K_z = \cup_{q=1}^{Q} I^{K_q}_{z,q}, \quad z = 1, 2, 3.$$

4 Numerical Examples 

In this section the SCEIF approach is illustrated with two examples, both involving an RGB
color image.

The picture at the bottom of Fig. 2 is an image of the nebula NGC 2264 created at the
European Southern Observatory (ESO) [23]. The resolution of this image is 1464 x 1280 pixels
per channel. The 2D intensity arrays, one for each channel, are the three pictures right above
the color one. In order to apply SCEIF, firstly each channel is divided into small blocks of 8 x 8
pixels. The blocks are approximated using the mixed dictionary of Sec. 2.2 and the approach
of Sec. 2.1. The approximation is of high quality. This is ensured by using two measures on
the whole color image: a high PSNR (42.5 dB) and a high Mean Structural Similarity Index
(0.997) [25] (further comments are given in Sec. 5.1). Each channel in Fig. 2 is folded and
reshaped to produce a single RGB image. The latter is the small picture at the top of Fig. 2.
Notice that the size of such an image is 'extra small' (120 x 1280 x 3 pixels) in comparison to
the original (1464 x 1280 x 3 pixels). This is because the representation of the full image by
the proposed mixed dictionary is very sparse. The SR for the image is 17.42.

Assuming now that the folded image is given to a partner stored in the original 16-bit RGB
format, in order to recover the image the receiver should proceed as follows: first the header
information is read. This is not encrypted and is required by the receiver to separate the image
components $G$ and $\tilde{G}$, and to reconstruct the three independent folded images, displayed in the
3rd row (from the top) of Fig. 2.



Now the process continues, as prescribed in Sec. 3.2, to recover the channels. The images
in the 4th row of Fig. 2 depict the recovered channels using the correct private key, shown
together as an RGB image in the last row. Because the authorized key is used, the recovery
was successful. Fig. 3 illustrates the identical process using an incorrect private key.

As a second example we proceed as before, but on a close up of the spider web photo, kindly
rendered by National Geographic [24]. There is a difference with the previous case in that,
instead of giving free access to the correct number of atoms per block $q = H+1, \ldots, Q$, in
this example those numbers are also hidden, together with the indices. The reason is that,
because of the contrast between the blocks containing the web and the rest of the blocks, those
numbers give some information about the image. Certainly, by knowing only those numbers
one can tell that the image has a very smooth background with some details only where the
spider web is located. This gives some visual information that one may want to avoid by hiding
those numbers.

Figure 2: The color image represents a high quality approximation (PSNR 42.5 dB) of an image of
the nebula NGC 2264. Credit ESO [23]. The small picture at the top is the RGB folded image. The
one right below is the part containing the indices. The three small pictures in the next row are the
folded channels (each of which contains coefficients of plain text representation of that channel). The
three larger pictures are the channels recovered from the previous ones. The bottom picture is the
recovered RGB image. The recovery is successful because it was realized with the authorized key.

Figure 3: Unsuccessful attempt to expand the image NGC 2264 of Fig. 2 using an incorrect key.

Figure 4: Same description as in Fig. 2 but the image is a close up of a spider web in Australia.
Courtesy of National Geographic. Photograph by Darlyne Murawski [24].

Figure 5: Unsuccessful attempt to expand the spider web image of Fig. 4 using an incorrect key.

The folded image (89 x 792 x 3 pixels) reduces the size of the original spider web photo
(512 x 792 x 3 pixels) less than in the previous case, because the SR is smaller: 7.95.

For comparative purposes we have implemented the SCEIF method using DCT, which is
also suitable for block processing. The implementation of the folding and encryption steps is
exactly the same; the only difference is that the approximation is performed by DCT, which
is straightforward and faster than with the dictionary. However, since the sparsity achieved by
DCT is lower (SR = 10.06 for the nebula image and SR = 4.23 for the spider web) the corresponding
folded images are larger (see Fig. 6). In addition, because the processing time is dominated by
the actual folding and expanding procedures, SCEIF implemented with the mixed dictionary
is faster than with DCT (see Table 2).
Figure 6: The first picture in the top line is the folded image (size 120 x 1280 x 3) of the nebula NGC
2264 (size 1464 x 1280 x 3) with the proposed dictionary. The second picture in the top line is the
folded image (size 202 x 1280 x 3) with DCT. The pictures in the bottom line, sizes (89 x 792 x 3)
and (165 x 792 x 3), are the folded images with the dictionary and DCT, respectively, of the spider
web image (size 512 x 792 x 3).





Running times (in secs)

Image        Method       Approximation   Folding   Expanding   Total
Nebula       Dictionary   10.9            10.7      13.7        35.3
Nebula       DCT          Disregarded     17.3      20.3        37.6
Spider web   Dictionary   4.9             4.7       5.6         15.2
Spider web   DCT          Disregarded     7.8       8.9         16.7



Table 2: Comparison of the folding and expanding times (average of five independent runs) with the 
mixed dictionary and DCT. The test was performed with MATLAB using a 14" laptop equipped with 
2.8GHz processor and 3GB RAM. As the implementation of the approximation with DCT was not 
optimized, the approximation times are not included in the calculation of the total execution time 
with this approach. The approximation with the dictionary was realized using a MEX file in C++ for
implementing OMP2DM1 to approximate the three channels simultaneously.
5 Quality and security issues



Concerning the quality of the recovered image there are two independent aspects to be discussed.
One is the quality of the approximation, $I^K$, of the original image $I$, and the other is the quality
of the recovery of $I^K$.

The security matters that will be discussed are restricted to key sensitivity and resistance 
to plain text attack. 

5.1 Quality 

The quality of the approximation, $I^K$, of the image $I$ is to be decided beforehand. In the
examples we have considered high quality approximations. This is assessed by two standard
measures. One is the PSNR, which is defined as

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{(2^{l_b} - 1)^2}{\mathrm{MSE}}\right), \qquad (16)$$

where $l_b$ is the number of bits used to represent the intensity of the pixels and

$$\mathrm{MSE} = \frac{\|I - I^K\|_F^2}{Z N_x N_y},$$

with $Z = 1$ for a gray level image and $Z = 3$ for an RGB image.
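
As an illustration of Eq. (16), the following Python sketch computes the PSNR between an image and its approximation. It is a minimal example rather than the code used for the reported figures; the function name `psnr` and its arguments are ours. The mean over all array entries plays the role of the MSE, since it divides the squared Frobenius norm of the difference by the total number of entries (pixels times channels).

```python
import numpy as np

def psnr(original, approximation, bits):
    """PSNR in dB between two arrays of equal shape, for `bits` bits per channel."""
    original = np.asarray(original, dtype=float)
    approximation = np.asarray(approximation, dtype=float)
    # Mean squared error over every pixel of every channel.
    mse = np.mean((original - approximation) ** 2)
    if mse == 0.0:
        return float("inf")  # identical arrays
    peak = 2.0 ** bits - 1.0  # largest representable intensity
    return float(10.0 * np.log10(peak ** 2 / mse))
```

For instance, a uniform error of one intensity level on an 8-bit image gives `psnr(I, I + 1, 8)` equal to `20 * log10(255)`, independently of the image size.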

In the two numerical examples of Sec. 4 the corresponding PSNR is high enough (42.5
dB) to secure approximations of high quality (with no visual degradation with respect to the
original image). The other measure we have used to assess the quality of the approximate image
is the Mean Structural Similarity index (MSSIM) [25], which for two identical images is equal to
one. The MSSIM index between the original image and the approximation, in both examples
of Sec. 4, is larger than 0.99. This value complements and confirms the quality indicated by
the PSNR.

Once the desired quality of the approximated image has been fixed, that approximation
becomes the plain text image to be folded and encrypted. Thus, the next goal is to recover the
approximate image with high fidelity. The recovery would be exact were it not for the quantization
step, which is introduced to store the folded image using integers. The present version of the
proposed scheme works with images stored using 16 bits per channel. At this precision, in both
examples, the MSSIM index between the image recovered with the right key and the plain text
image is equal to one. The PSNR between the authorized recovered image and the original
image is identical to that between the plain text image and the original one.
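
To give an idea of the size of the error introduced by storing real values as 16-bit integers, the Python sketch below implements a generic uniform quantizer. This is an assumption made for illustration only: the actual quantization rule of the implementation is not described here, and the names `quantize_uint16` and `dequantize_uint16` are hypothetical.

```python
import numpy as np

def quantize_uint16(x):
    """Map a real array onto the integers 0..65535 (uniform quantization)."""
    x = np.asarray(x, dtype=float)
    lo, hi = float(x.min()), float(x.max())
    # Width of one quantization level; any value works for a constant array.
    step = (hi - lo) / 65535 if hi > lo else 1.0
    q = np.round((x - lo) / step).astype(np.uint16)
    return q, lo, step

def dequantize_uint16(q, lo, step):
    """Approximate inverse of quantize_uint16."""
    return q.astype(float) * step + lo
```

Under this assumption the absolute error of each recovered value is at most half a quantization step, i.e. about 1/131070 of the range of the data.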

5.2 Security 

The security of the encryption scheme we have adopted relies on the random number generator.
The more reliable the random generator is, the safer the encryption procedure. Our implementation
uses a simple 32-bit pseudo random number generator but, apart from the convenience
of having it at hand, there is no reason for using that particular one. On the contrary, the
recommendation is, of course, to use a true random number generator if that is possible.

While the key space for the present implementation is $2^{32}$, simply by making access to the
order of orthogonalization private (c.f. (10) and (14)) the key space would be expanded.

Key sensitivity: The high sensitivity to small variations in the private key is illustrated
by Figures 3 and 5. The failed recoveries shown in those figures were attempted using a key
differing by only one digit from the correct one. The private key is 1234567891 and the tested
key 1234567890. The PSNR between the plain text image and the image recovered with the
wrong key is 10.8 dB for the image of Fig. 3 and 9.15 dB for the image of Fig. 5. This sensitivity
was verified statistically by repeating the experiment with 100 keys differing in only one digit
from the correct key. The mean value of the resultant PSNR for the nebula image is 10.68 dB
with standard deviation 1.23. For the spider web image the mean value of the PSNR is 8.6 dB with
standard deviation 1.41.
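
The mechanism behind this sensitivity can be mimicked with a toy example. The Python sketch below is not the SCEIF encryption operator: as a stand-in we assume an orthogonal matrix obtained from the QR factorization of a Gaussian matrix seeded with the key, and the helper names are hypothetical. It only shows that two keys differing in the last digit lead to unrelated operators, so that decryption with the wrong key fails completely.

```python
import numpy as np

def orthogonal_operator(key, size):
    """Toy stand-in: orthogonal matrix from the QR factorization of a key-seeded matrix."""
    rng = np.random.default_rng(key)
    q, _ = np.linalg.qr(rng.standard_normal((size, size)))
    return q

def relative_error(x, y):
    """Euclidean distance between x and y relative to the norm of x."""
    return float(np.linalg.norm(x - y) / np.linalg.norm(x))

size = 64
plain = np.random.default_rng(0).random(size)  # stand-in for plain text data
cipher = orthogonal_operator(1234567891, size) @ plain

# The inverse of an orthogonal matrix is its transpose.
right = orthogonal_operator(1234567891, size).T @ cipher
wrong = orthogonal_operator(1234567890, size).T @ cipher

print(relative_error(plain, right))  # negligible: recovery with the right key
print(relative_error(plain, wrong))  # large: recovery with the wrong key fails
```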

Prevention of plain text attacks: In order to avoid repetitions of the encryption operators 
(c.f. (10) and (14)) the random arrays ([s]) and (12) should be guaranteed to be different every
time the procedure is executed. That is the role the public initialization seed plays at the 
folding step. The seed can be set automatically, for instance as the date and time right before 
the vectors are generated. Thus, the nonlinearity of the operation Orth(·) prevents an attacker
from inverting the system of equations ^ and (15) using correctly decrypted plain text images. 

Notice that the public seed ensures that even identical plain text images produce different 
cipher images. Such a feature allows us to disregard the consideration of matters regarding 
diffusion, which requires that minor changes in the plain text image produce major changes
in the cipher image. 
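
The role of the public seed can be sketched as follows. The Python example below is an illustration under our own assumptions, not the authors' implementation: it uses NumPy's generator instead of the 32-bit generator mentioned above, and the names `random_arrays` and `fresh_public_seed` are hypothetical. The random arrays depend on both the private key and the public seed, so a new seed yields new arrays even though the key is unchanged, while the authorized user can regenerate them from the pair (key, seed).

```python
import time
import numpy as np

def random_arrays(private_key, public_seed, shape):
    """Random array determined jointly by the private key and the public seed."""
    rng = np.random.default_rng([private_key, public_seed])
    return rng.random(shape)

def fresh_public_seed():
    """Public seed taken from the clock right before the arrays are generated."""
    return time.time_ns()

key = 1234567891
first = random_arrays(key, 20240101120000, (4, 4))
again = random_arrays(key, 20240101120000, (4, 4))   # same seed: same arrays
second = random_arrays(key, 20240101120001, (4, 4))  # new seed: new arrays
```

The seed travels in the clear with the cipher image; only the private key has to remain secret.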



6 Conclusions 

The recently introduced EIF approach has been extended to SCEIF by introducing the following
features:

• The folding capacity of the approach has been improved by considering a new dictionary
for the approximation.

• The approach is now self contained. All that is required to recover the plain text image
is the folded (cipher) image and the private key. This is achieved by enlarging the
folded image, creating ad hoc blocks to place the indices of the dictionary elements
participating in the image approximation (plain text image).

• The implementation has also been extended from gray level to color images.

The success of the approach is based on two fundamental and related features. One is the
possibility of reducing the data dimensionality by a powerful, highly nonlinear transformation.
The other is the possibility of implementing the approach in an affordable period of time. The
proposed dictionary plays a central role in ensuring both features, by allowing for processing
obeying a scaling law. Indeed, the fact that the approximation of a large image can be
realized by dividing it into small blocks is the key to the current effective implementation.
It should be emphasized that the numerical examples have been realized on a small laptop in
the MATLAB environment. Simply by implementing the method in a compiled programming language,
such as C or Fortran, the folding and expanding times given in Table 2 could be reduced by up
to a factor of ten. In addition, there is room for a straightforward parallel implementation if
those resources are available.



Final Remarks 

• The scope of SCEIF is to fold an image in encrypted form. The size of the astronomical
image is reduced 12.2 times (pixel wise) and that of the spider web 5.75 times. We are not
considering here any further compression stage, which would involve converting the folded image
into a bit stream. It should be stressed that, in order to do that, the encoding technique
should be especially conceived to deal with the type of data that SCEIF generates by
folding the image.

• The simple symmetric key encryption procedure considered here leaves room for
straightforward improvement, e.g.,

a) The key space could be extended by the orthogonalization operation. In the present
version the orthogonalization step (c.f. (10) and (14)) is assumed to be completely known.
However, simply by making access to the order of orthogonalization private, the key space
would be expanded.

b) The other foreseeable possibility for strengthening the encryption algorithm is to
further scramble the folded image using a chaos-based encryption algorithm.



For the above reasons the proposed SCEIF approach appears to us a very exciting
possibility. We feel confident that it will stimulate further work in this direction.
A. Construction of the matrices $B_n^k$, $n = 1, \ldots, k$ (c.f. (5))

For $z = 1, 2, 3$ the coefficients $c_n^z$, $n = 1, \ldots, k$ in (5) should be determined in such a way that
$\|R_z\|_F$ is minimum for each $z$. This is ensured by requesting that $R_z = I_z - P_{V_k} I_z$, $z = 1, 2, 3$,
where $P_{V_k}$ is the orthogonal projection operator onto $V_k = \mathrm{span}\{A_n\}_{n=1}^{k}$. The required
representation of $P_{V_k}$ is of the form $P_{V_k} I = \sum_{n=1}^{k} A_n \langle B_n^k, I \rangle_F$, where each $A_n \in \mathbb{R}^{N_x \times N_y}$ is an
array with the selected atoms $A_n = D^x_{\ell^x_n} \otimes D^y_{\ell^y_n}$ and $B_n^k$, $n = 1, \ldots, k$ the concomitant reciprocal
matrices. These are the unique elements of $\mathbb{R}^{N_x \times N_y}$ satisfying the conditions:



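i) $\langle A_n, B_m^k \rangle_F = \delta_{n,m}$, where $\delta_{n,m} = 1$ if $n = m$ and $\delta_{n,m} = 0$ if $n \neq m$.

ii) $V_k = \mathrm{span}\{B_n^k\}_{n=1}^{k}$.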
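Such matrices can be adaptively constructed through the recursion formula [19]:

$$B_n^{k+1} = B_n^k - B_{k+1}^{k+1} \langle A_{k+1}, B_n^k \rangle_F, \quad n = 1, \ldots, k,$$

where

$$B_{k+1}^{k+1} = \frac{W_{k+1}}{\|W_{k+1}\|_F^2}, \quad \text{with } W_1 = A_1 \text{ and } W_{k+1} = A_{k+1} - \sum_{n=1}^{k} \frac{W_n}{\|W_n\|_F^2} \langle W_n, A_{k+1} \rangle_F. \qquad (A.1)$$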



For numerical accuracy in $W_n$, $n = 1, \ldots, k+1$, at least one re-orthogonalization step is usually
needed. It implies that one needs to recalculate these matrices as

$$W_{k+1} = W_{k+1} - \sum_{n=1}^{k} \frac{W_n}{\|W_n\|_F^2} \langle W_n, W_{k+1} \rangle_F. \qquad (A.2)$$

With matrices $B_n^k$, $n = 1, \ldots, k$ constructed as above the required coefficients in (5) are
obtained, for $z = 1, 2, 3$, from the inner products

$$c_n^z = \langle B_n^k, I_z \rangle_F, \quad n = 1, \ldots, k.$$
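
The recursion (A.1), together with the re-orthogonalization step (A.2), can be sketched in a few lines of Python. This is an illustrative translation of the formulas rather than the MEX implementation used in the numerical examples; the function names are ours, and the atoms are assumed to be linearly independent arrays of equal shape.

```python
import numpy as np

def inner(X, Y):
    """Frobenius inner product <X, Y>_F."""
    return float(np.sum(X * Y))

def subtract_projections(X, W):
    """Remove from X its components along the mutually orthogonal arrays in W."""
    correction = np.zeros_like(X)
    for W_n in W:
        correction += W_n * (inner(W_n, X) / inner(W_n, W_n))
    return X - correction

def reciprocal_matrices(atoms):
    """Return B_1, ..., B_k biorthogonal to the atoms A_1, ..., A_k."""
    W, B = [], []
    for A_new in atoms:
        A_new = np.asarray(A_new, dtype=float)
        W_new = subtract_projections(A_new, W)  # orthogonalization, as in (A.1)
        W_new = subtract_projections(W_new, W)  # re-orthogonalization (A.2)
        B_new = W_new / inner(W_new, W_new)
        # Update the previous reciprocal matrices, then append the new one.
        B = [B_n - B_new * inner(A_new, B_n) for B_n in B]
        B.append(B_new)
        W.append(W_new)
    return B
```

Given the list `B` returned by `reciprocal_matrices(atoms)`, the coefficients for a channel `I_z` are `[inner(B_n, I_z) for B_n in B]`, and the residual of the resulting approximation is orthogonal to every selected atom.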